Solution to the FiveThirtyEight Riddler puzzle “Can You Find The Fish In State Names?” (May 22, 2020).
Ohio is the only state whose name doesn’t share any letters with the word “mackerel.” It’s strange, but it’s true.
But that isn’t the only pairing of a state and a word you can say that about — it’s not even the only fish! Kentucky has “goldfish” to itself, Montana has “jellyfish” and Delaware has “monkfish,” just to name a few.
What is the longest “mackerel?” That is, what is the longest word that doesn’t share any letters with exactly one state? (If multiple “mackerels” are tied for being the longest, can you find them all?)
Extra credit: Which state has the most “mackerels?” That is, which state has the most words for which it is the only state without any letters in common with those words?
(For both the Riddler and the extra credit, please refer to Friend of the Riddler™ Peter Norvig’s word list.)
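Before scaling up to the full word list, the definition can be checked against the four fish quoted in the puzzle. The following is a minimal Python sketch rather than the code used for the solution; the helper name `lonely_states` is made up for illustration, and it assumes state names are compared in lowercase with spaces removed.

```python
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def letters(name):
    """Set of lowercase letters in a name, ignoring spaces."""
    return set(name.lower().replace(" ", ""))


STATE_LETTERS = {state: letters(state) for state in STATES}


def lonely_states(word):
    """States that share no letters with `word` (hypothetical helper)."""
    word = word.lower()
    return [s for s, ls in STATE_LETTERS.items() if ls.isdisjoint(word)]


# A word is a "mackerel" when exactly one state comes back.
for fish in ["mackerel", "goldfish", "jellyfish", "monkfish"]:
    print(fish, lonely_states(fish))
```

Each of the four fish should come back with a single state, matching the pairings in the puzzle text.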
I took a quick look at the word list: it has about 260K words, so checking every word against all 50 states is roughly 13M comparisons, which we can easily handle by brute force.
# tidyverse supplies pull(), enframe(), unnest(), count(), fct_*() and ggplot() used below
library(tidyverse)

states_orig <- c(
  "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
  "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois",
  "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland",
  "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana",
  "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York",
  "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
  "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah",
  "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"
)
states_mod <- gsub(x = states_orig, " ", "") |> tolower()
names(states_mod) <- states_orig
words <- read.table("https://norvig.com/ngrams/word.list") |> pull(V1)
# Python chunk: `r` exposes objects from the R session via reticulate
words = r.words
states = r.states_orig
states_set = [set(state) for state in r.states_mod]

# Map each "mackerel" to the one state it shares no letters with
mackerels = {
    word: state_sel[0]
    for word in words
    # keep the word only if exactly one state has no letters in common with it
    if len(state_sel := [
        state for state, state_letters in zip(states, states_set)
        if state_letters.isdisjoint(word)
    ]) == 1
}
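The same check can also be written with 26-bit masks, which turns each word-state comparison into a single bitwise AND. This is an alternative sketch, not what the code above does; `mask` and `sole_disjoint_state` are illustrative names, and the state list is cut down to three entries to keep the example small, so results for other words will differ from the full 50-state run.

```python
def mask(text):
    """Encode the letters a-z present in `text` as bits 0-25 of an int."""
    bits = 0
    for ch in text.lower():
        if "a" <= ch <= "z":
            bits |= 1 << (ord(ch) - ord("a"))
    return bits


# Shortened, illustrative state list (the real run uses all 50 states)
states = ["Ohio", "Kentucky", "New York"]
state_masks = [mask(s) for s in states]


def sole_disjoint_state(word):
    """Return the single state sharing no letters with `word`, else None."""
    w = mask(word)
    hits = [s for s, m in zip(states, state_masks) if (w & m) == 0]
    return hits[0] if len(hits) == 1 else None


print(sole_disjoint_state("mackerel"))  # Ohio
print(sole_disjoint_state("goldfish"))  # Kentucky
print(sole_disjoint_state("zzz"))       # None: disjoint from all three states
```

Precomputing the state masks once means the inner loop does no set construction at all, which may matter if the word list were much larger than this one.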
library(reticulate)
mackerels <- py$mackerels
# py$mackerels arrives as a named list; unnest the list column explicitly
d <- enframe(mackerels, name = "word", value = "state") |> unnest(state)
d |>
  mutate(word_length = nchar(word)) |>
  arrange(desc(word_length))
## # A tibble: 45,385 × 3
## word state word_length
## <chr> <chr> <int>
## 1 counterproductivenesses Alabama 23
## 2 hydrochlorofluorocarbon Mississippi 23
## 3 counterproductiveness Alabama 21
## 4 unconscientiousnesses Alabama 21
## 5 counterconditionings Alabama 20
## 6 deoxycorticosterones Alabama 20
## 7 expressionlessnesses Utah 20
## 8 hyperconsciousnesses Alabama 20
## 9 hypersensitivenesses Alabama 20
## 10 incompressiblenesses Utah 20
## # ℹ 45,375 more rows
Two words tie for the longest at 23 letters: “counterproductivenesses” (Alabama) and “hydrochlorofluorocarbon” (Mississippi).
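Because the puzzle asks for every word tied for longest, it is safer to filter on the maximum length than to read off the first row. A small Python sketch of that step, using only the first five rows printed above as sample data (the variable names are illustrative):

```python
# Sample of the result: word -> the one state it shares no letters with
top_rows = {
    "counterproductivenesses": "Alabama",
    "hydrochlorofluorocarbon": "Mississippi",
    "counterproductiveness": "Alabama",
    "unconscientiousnesses": "Alabama",
    "counterconditionings": "Alabama",
}

# Find the maximum length first, then keep every word that reaches it
longest = max(len(word) for word in top_rows)
ties = {word: state for word, state in top_rows.items() if len(word) == longest}

print(longest)
for word, state in ties.items():
    print(word, state)
```

On the full result the same two steps apply unchanged; only the input dictionary grows.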
d |> count(state, sort = TRUE)
## # A tibble: 32 × 2
## state n
## <chr> <int>
## 1 Ohio 11342
## 2 Alabama 8274
## 3 Utah 6619
## 4 Mississippi 4863
## 5 Hawaii 1763
## 6 Kentucky 1580
## 7 Wyoming 1364
## 8 Tennessee 1339
## 9 Alaska 1261
## 10 Nevada 1229
## # ℹ 22 more rows
The answer is Ohio, with 11,342 mackerels!
Plot for comparison:
d |>
  count(state) |>
  mutate(
    # keep the 10 states with the most mackerels, lump the rest into "Other"
    state = fct_lump_n(state, n = 10, w = n),
    state = fct_reorder(state, n, .desc = TRUE)
  ) |>
  ggplot(aes(state, n)) +
  geom_col() +
  labs(title = "# Mackerels by State", x = "State", y = "Count")